ABSTRACT

Using an eigenmode expansion of the Mark III and SFI surveys of cosmological
radial velocities, a goodness-of-fit analysis is applied on a mode-by-mode basis.
This differential analysis complements the Bayesian maximum likelihood
analysis that finds the most probable model given the data. Analyzing the
surveys with their corresponding most likely models from the CDM-like family
of models, as well as with the currently popular $\Lambda$-CDM model, reveals a
systematic inconsistency of the data with these 'best' models. There is a
systematic trend of the cumulative $\chi^2$ to increase with the mode number (where
the modes are sorted by decreasing order of the eigenvalues). This corresponds
to a decrease of the $\chi^2$ with the variance associated with a mode, and hence
with its effective scale. It follows that the differential analysis finds that on
small (large) scales the global analysis of all the modes 'puts' less (more) power
than actually required by the data. This observed trend might indicate one
of the following: a. The theoretical model (i.e. power spectrum) or the error
model (or both) have an excess of power on large scales; b. Velocity bias; c. The
velocity data suffer from still uncorrected systematic errors.

Subject headings: large scale structure, radial velocities 
1. Introduction

Surveys of radial velocities of galaxies have played a major role in the study of the large 
scale structure. The analysis of such surveys has been conducted in two main directions, 
the mapping of the local cosmography and the estimation of the cosmological parameters 
(cf. Dekel 1994 for a review). The Bayesian framework provides one with very elegant 
and powerful tools for conducting both the mapping and parameter estimation, where 
the recovery of the large scale structure is done by means of the Wiener filter and the 
parameters are estimated by maximum likelihood (MaxLike) analysis (Zaroubi et al. 1995, 
hereafter ZHFL). In the case where the deviations from a homogeneous and isotropic 
universe constitute a Gaussian random field the Wiener filter and the MaxLike are the 
optimal tools for performing such an analysis (ZHFL). Indeed, the Mark III catalog of
radial velocities (Willick et al. 1995, 1996, 1997a) has recently been analyzed by Wiener
filtering (Zaroubi, Hoffman and Dekel 1999) and by MaxLike (Zaroubi et al. 1997). The
SFI survey of da Costa et al. (1996) has been studied by MaxLike analysis by Freudling 
et al. (1999) and by Wiener filtering (Hoffman and Zaroubi, unpublished). Both surveys 
seem to yield similar results. 

In the Bayesian MaxLike analysis one calculates the posterior probability of a model 
to be correct given the data (ZHFL, Vogeley and Szalay 1996). Thus the model that 
maximizes the likelihood function, over a given parameter (or model) space, is the most 
likely model in that space. The MaxLike analysis cannot guarantee, however, that the most 
probable model is indeed consistent with the data. It provides only a relative measure for 
models to be correct. It is common to adopt an independent measure for the goodness-of-fit, 
which is often given by the requirement that the reduced $\chi^2$ is close to unity. Often, when
the most likely model (given the data) also passes the goodness-of-fit test, one assumes that
the 'correct' model has been nailed down. Here, the $\chi^2$ test is expanded and a much more
critical test is suggested and then applied to the Mark III and SFI surveys.

The $\chi^2$ 'goodness-of-fit' is based on the assumption that all the random variables that
affect the observables are normally distributed. In the cosmological context this applies to
both the underlying dynamical (e.g. density and velocity) field and the statistical errors.
Thus for a survey containing N data observables (e.g. radial velocities) the $\chi^2$ of the system
of N degrees of freedom (DOF) is calculated given the model that maximizes the likelihood
function. The goodness-of-fit is measured by how close the $\chi^2$/DOF is to unity. This
provides a global measure for the consistency of the data with the model, as it includes all
the observables. A situation might occur of some 'conspiracy' where different parts of the
data deviate from the predictions of the model, but when combined together they 'conspire'
to yield a reasonable $\chi^2$. A much stronger test of the model is to decompose the data
into statistically independent eigenmodes and observe the $\chi^2$ behavior of the independent
modes. Eigenmode analysis, also known as principal component analysis (PCA) and the
Karhunen-Loeve transform, is not a new tool in the field. It has been applied to studies of
redshift surveys (Vogeley and Szalay 1996), the cosmic microwave background (Bunn 1997,
Bond 1995) and more recently radial velocity surveys (Hoffman 1999). The latter study
is extended here to perform the 'goodness-of-fit' test on a mode-by-mode basis. The basic
formalism is presented in § 2, and its application to the Mark III and SFI surveys is given
in § 3. Our results are discussed and the conclusions are summarized in § 4.



2. Eigenmode Analysis of Radial Velocities 



Consider a data base of radial velocities $\{u_i\}_{i=1,\ldots,N}$, where

$$ u_i = \mathbf{v}(\mathbf{r}_i) \cdot \hat{\mathbf{r}}_i + \epsilon_i , \tag{1} $$

$\mathbf{v}$ is the three dimensional velocity, $\mathbf{r}_i$ is the position of the i-th data point and $\epsilon_i$ is the
statistical error associated with the i-th radial velocity. The assumption made here is of a 
cosmological model that well describes the data, that systematic errors have been properly 
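dealt with and that the statistical errors are well understood. The data auto-covariance
matrix is then written as:

$$ R_{ij} \equiv \langle u_i u_j \rangle = \hat{\mathbf{r}}_i^{\dagger} \langle \mathbf{v}(\mathbf{r}_i) \mathbf{v}(\mathbf{r}_j) \rangle \hat{\mathbf{r}}_j + \langle \epsilon_i \epsilon_j \rangle . \tag{2} $$

(Here $\langle \cdots \rangle$ denotes an ensemble average.) The last term is the error covariance matrix. The
velocity covariance tensor that enters this equation was derived by Gorski (1988, see also
Zaroubi, Hoffman and Dekel 1999) and it depends on the power spectrum and cosmological
parameters.

The eigenmodes of the data covariance matrix provide a natural representation of the
data:

$$ R \, \eta^{(i)} = \lambda_i \, \eta^{(i)} . \tag{3} $$

The set of N eigenmodes $\{\eta^{(i)}\}$ constitutes an orthonormal basis and the eigenvalues $\lambda_i$ are
arranged in decreasing order (in absolute values). A new representation of the data is given
by:

$$ \tilde{a}_i = \sum_j \eta^{(i)}_j u_j . \tag{4} $$

This provides a statistically orthogonal representation, namely:

$$ \langle \tilde{a}_i \tilde{a}_j \rangle = \lambda_i \delta_{ij} . \tag{5} $$

The normalized transformed variables are defined by:

$$ a_i = \tilde{a}_i / \sqrt{\lambda_i} . \tag{6} $$

Eq. 5 is written now as:

$$ \langle a_i a_j \rangle = \delta_{ij} . \tag{7} $$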
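
The transformation of Eqs. (3) to (7) can be sketched numerically as follows. This is a minimal illustration and not the code used for the surveys: the function name `eigenmode_transform` and the toy covariance (a smooth 'signal' term plus a diagonal 'error' term) are assumptions made here for the example, standing in for the velocity covariance tensor and the survey error matrix.

```python
import numpy as np

def eigenmode_transform(u, R):
    """Project data u onto the eigenmodes of the covariance matrix R.

    Returns the eigenvalues in decreasing order, the eigenvectors
    (one per column) and the normalized coefficients a_i.
    """
    lam, eta = np.linalg.eigh(R)        # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]       # re-sort by decreasing eigenvalue
    lam, eta = lam[order], eta[:, order]
    a_tilde = eta.T @ u                 # Eq. (4): a~_i = sum_j eta^(i)_j u_j
    a = a_tilde / np.sqrt(lam)          # Eq. (6): unit-variance coefficients
    return lam, eta, a

# Toy covariance (hypothetical): smooth signal term + diagonal error term.
rng = np.random.default_rng(0)
N = 50
x = np.linspace(0.0, 1.0, N)
signal = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2)
errors = np.diag(rng.uniform(0.5, 1.5, N))
R = signal + errors

# One mock data vector drawn from the same covariance.
u = rng.multivariate_normal(np.zeros(N), R)
lam, eta, a = eigenmode_transform(u, R)
```

Because the eigenvectors are orthonormal, the data vector can be recovered from the coefficients as `eta @ (np.sqrt(lam) * a)`, so no information is lost in the change of representation.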
Note that as the modes are statistically independent one can measure the $\chi^2$ of a given
mode, $\chi^2_i = a_i^2$, and the cumulative reduced $\chi^2$ is given by:

$$ \chi^2_M = \frac{1}{M} \sum_{i=1}^{M} a_i^2 . \tag{8} $$

For normally distributed errors and a Gaussian random velocity field the $a_i$'s are normally
distributed with zero mean and a variance of unity.
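
As a small worked example of Eq. (8), the cumulative reduced statistic is a running mean of the squared normalized coefficients. The function name and the four input values below are illustrative assumptions, not survey data.

```python
import numpy as np

def cumulative_reduced_chi2(a):
    """Running mean of a_i^2 over the first M modes, for M = 1..N (Eq. 8)."""
    a = np.asarray(a, dtype=float)
    M = np.arange(1, a.size + 1)
    return np.cumsum(a ** 2) / M

# Four made-up normalized coefficients, ordered by decreasing eigenvalue.
a = np.array([1.0, -2.0, 0.0, 1.0])
chi2_M = cumulative_reduced_chi2(a)
# Squares are [1, 4, 0, 1], running sums [1, 5, 5, 6],
# so chi2_M = [1.0, 2.5, 1.6667, 1.5]
```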

In addition the probability of finding such a $\chi^2_M$ is calculated as well. The probability is
defined by

$$ P(\chi^2_M) = \begin{cases} P_{\chi^2}(M \chi^2_M, M) & \text{for } P_{\chi^2}(M \chi^2_M, M) < 0.5 \\ 1 - P_{\chi^2}(M \chi^2_M, M) & \text{otherwise,} \end{cases} \tag{9} $$

where $P_{\chi^2}(x, M)$ is the probability that a random variable drawn from a $\chi^2$ distribution
with M degrees of freedom is less than a given value x.
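
The probability of Eq. (9) is the smaller of the two tails of the $\chi^2$ distribution, and can be sketched with SciPy's `chi2` distribution. The function name `chi2_tail_probability` is an assumption made for this example. The upper tail is taken from the survival function `chi2.sf`, which equals one minus the cumulative distribution but is numerically safer when that tail is very small.

```python
import numpy as np
from scipy.stats import chi2

def chi2_tail_probability(chi2_M, M):
    """Eq. (9): the CDF value if it is below 0.5, otherwise 1 - CDF."""
    x = np.asarray(M, dtype=float) * np.asarray(chi2_M, dtype=float)
    cdf = chi2.cdf(x, df=M)             # P_chi2(M * chi2_M, M)
    upper = chi2.sf(x, df=M)            # 1 - CDF, computed stably
    return np.where(cdf < 0.5, cdf, upper)

# For M = 2 the CDF is 1 - exp(-x/2), which gives a closed-form check.
p_high = float(chi2_tail_probability(1.0, 2))   # x = 2, upper tail = exp(-1)
p_low = float(chi2_tail_probability(0.1, 2))    # x = 0.2, lower tail = 1 - exp(-0.1)
```

By construction the returned value never exceeds 0.5, so small values flag a cumulative $\chi^2$ that is either too low or too high for the given number of modes.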

3. Differential $\chi^2$ Analysis

Here the goodness-of-fit of the Mark III and SFI surveys is studied. The models
studied here are the MaxLike solutions for these surveys, which are slightly different
from one another. The most likely model given Mark III is a tilted-CDM (T-CDM) of
$\Omega = 1$, $h = 0.75$ and $n = 0.8$, where $\Omega$ is the cosmological density parameter, $h$ is Hubble's
constant in units of 100 km/s/Mpc and $n$ is the power spectrum index (Zaroubi et al.
1997). The most likely model given SFI is an open CDM (OCDM) of $\Omega = 0.79$, $h = 0.6$
and $n = 0.92$ (Freudling et al. 1999). For both cases the MaxLike best model has a total
$\chi^2_{M=N}$ very close to unity. Thus, from the point of view of the integral $\chi^2$ the MaxLike
solutions seem to be very consistent with the data. This is extended to perform a differential
$\chi^2$ analysis, namely to study the $\chi^2$ behavior across the mode spectrum.
To study the robustness of this probe it is first applied to a linear mock catalog of
Mark III, constructed from an unconstrained realization of the velocity field. This field is
sampled at the location of the Mark III data points, to which normally distributed errors
are added according to Mark III's error covariance matrix. The cumulative $\chi^2$ of such a
catalog should oscillate around unity, given that the model used to generate the catalog
is known. Indeed, this has been confirmed by an analysis of a few linear mock catalogs of
Mark III. The probabilities of obtaining such a $\chi^2$ distribution lie comfortably within the
90% confidence level. The non-trivial result of this test is that the very poor sampling of
the long wavelength Fourier waves, i.e. cosmic variance, does not affect the goodness-of-fit
test.
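
The logic of the mock test can be sketched as follows: draw Gaussian mock data from a known covariance, analyze them with that same covariance, and compare the cumulative $\chi^2$ with a 90% band. Everything specific here is an assumption made for illustration: the toy covariance stands in for the Mark III signal plus error model, the function names are hypothetical, and the band is taken to be equal-tailed (5% to 95%), which the text does not specify.

```python
import numpy as np
from scipy.stats import chi2

def cumulative_chi2_of_mocks(R, n_mocks, rng):
    """Cumulative reduced chi^2 curves for Gaussian mocks drawn from R."""
    lam, eta = np.linalg.eigh(R)
    lam, eta = lam[::-1], eta[:, ::-1]            # decreasing eigenvalue order
    u = rng.multivariate_normal(np.zeros(lam.size), R, size=n_mocks)
    a = (u @ eta) / np.sqrt(lam)                  # normalized coefficients per mock
    M = np.arange(1, lam.size + 1)
    return np.cumsum(a ** 2, axis=1) / M

def confidence_band(N, level=0.90):
    """Equal-tailed band of chi^2_M for M = 1..N (an assumed convention)."""
    M = np.arange(1, N + 1)
    lower = chi2.ppf(0.5 * (1.0 - level), M) / M
    upper = chi2.ppf(0.5 * (1.0 + level), M) / M
    return lower, upper

# Hypothetical toy covariance: smooth signal term + unit error variance.
rng = np.random.default_rng(1)
N = 60
x = np.linspace(0.0, 1.0, N)
R = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2) + np.eye(N)

curves = cumulative_chi2_of_mocks(R, n_mocks=200, rng=rng)
lower, upper = confidence_band(N)
# Fraction of mocks whose total chi^2/DOF falls inside the band.
inside = np.mean((curves[:, -1] > lower[-1]) & (curves[:, -1] < upper[-1]))
```

When the analysis model matches the generating model, roughly 90% of the mocks should end inside the band at any given mode number, and the band always brackets unity.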

The differential $\chi^2$ and its associated probability of the Mark III and SFI surveys are
presented in Fig. 1, each case analyzed in its maximum likelihood solution. A clear trend
is noticed, namely over almost the entire mode spectrum the cumulative $\chi^2$ increases
monotonically. When all modes are included the total $\chi^2$/DOF is indeed close to 1, but
if we were to take only half the modes, starting from the top or the bottom, a very different $\chi^2$
would have been obtained.
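
A synthetic example makes this point concrete. The coefficients below are constructed by hand, not taken from either survey, so that $a_i^2$ grows linearly with the mode number while the total $\chi^2$/DOF is exactly unity. The function name is an assumption for this sketch.

```python
import numpy as np

def cumulative_chi2_both_ways(a):
    """Cumulative reduced chi^2 starting from either end of the spectrum."""
    a2 = np.asarray(a, dtype=float) ** 2
    M = np.arange(1, a2.size + 1)
    down = np.cumsum(a2) / M          # start from the largest-eigenvalue mode
    up = np.cumsum(a2[::-1]) / M      # start from the smallest-eigenvalue mode
    return down, up

# Made-up coefficients: a_i^2 = 2 i / (N + 1), which sums to N exactly.
N = 100
i = np.arange(1, N + 1)
a = np.sqrt(2.0 * i / (N + 1))
down, up = cumulative_chi2_both_ways(a)

total = down[-1]               # 1.0: the global test is passed
first_half = down[N // 2 - 1]  # 51/101, about 0.505
last_half = up[N // 2 - 1]     # 151/101, about 1.495
```

The global statistic is exactly 1, yet either half of the modes taken alone deviates from unity by about 50%, which is the kind of 'conspiracy' the differential analysis is designed to expose.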

The differential $\chi^2$ analysis is repeated for the currently popular model of
$\Lambda$-CDM ($\Omega_0 = 0.4$, $h = 0.6$ and $n = 1$; Fig. 2). Indeed, the same trend is found in this case as
well but the total $\chi^2$ converges to a value outside the 90% confidence level.

The conclusion that follows is that for both data sets, Mark III and SFI, and for
a variety of theoretical models the differential $\chi^2$ increases monotonically with the mode
number (with the exception of the first 10 modes of the Mark III). The theoretical
expectation is that if indeed the data are consistent with the assumed model then $\chi^2_M$ will
fluctuate around unity. The probability of observing such a trend given a model is very
small across most of the mode number range.
Fig. 1. The cumulative $\chi^2$ (left) and the probability of this $\chi^2$ distribution (right) of the
Mark III (solid line) and SFI (dashed line) surveys are plotted against the mode number. The
models used here are the tilted-CDM ($\Omega = 1$, $h = 0.75$ and $n = 0.8$; Mark III) and the open
CDM ($\Omega = 0.79$, $h = 0.6$ and $n = 0.92$; SFI). The lower and upper 90% confidence levels
are superimposed on the left figure (dotted lines). The modes are arranged by decreasing
order of the eigenvalues.
4. Discussion

What have we learned from the differential $\chi^2$ analysis? It has been found that even
the most probable CDM-like model, the one that maximizes the likelihood function given
the data, is not fully consistent with the data. The cumulative $\chi^2$ has been calculated
both downwards and upwards (namely starting from the modes with the largest and
smallest eigenvalue, respectively). Over more than 90% of the modes the cumulative $\chi^2$
lies well outside the 90% confidence level, indicating a very small probability of measuring
such data given the assumed model. Over most of the mode number range $\chi^2_M$ increases
monotonically. It is this behavior of the $\chi^2$ which indicates a systematic inconsistency of the
model with the data. The assumed model actually contains two ingredients, the theoretical
power spectrum and the error model. However, the present analysis cannot indicate which
one is to be 'blamed' for the systematic trend. It should be noted here that apart from the
first few (10-20) modes there is a clear correlation of the eigenvalue of a mode with the weighted
mean distance of its data points. Namely, the variance associated with a
mode (i.e. its eigenvalue) increases with its mean distance (Zehavi, private communication,
Silverman et al. in preparation). (Note that these first 10-20 modes are the ones dominated
by the underlying velocity field and not the noise, Hoffman 1999.) It follows that the $\chi^2$ trend
seen here is closely correlated
with the distance and that the data 'asks' for less power on large scales than the model
(power spectrum and noise) provides. A detailed study of possible modifications of the power
spectrum and error model is to be given elsewhere (Silverman et al. in preparation).
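
One way to attach a distance to each mode is sketched below. The text does not state how the weighted mean distance is defined, so the weighting used here, the squared components of the normalized eigenvector, is purely an assumption for illustration, as is the function name.

```python
import numpy as np

def mode_mean_distance(eta, r):
    """Mean distance of each mode, weighted by squared eigenvector components.

    eta[:, i] is the i-th orthonormal eigenvector; r holds the distance of
    each data point. The choice of weights is an assumption of this sketch.
    """
    weights = np.asarray(eta, dtype=float) ** 2   # each column sums to 1
    return weights.T @ np.asarray(r, dtype=float)

# Two data points at distances 1 and 3 (arbitrary units).
r = np.array([1.0, 3.0])

# Modes localized on single points inherit the distances of those points.
d_local = mode_mean_distance(np.eye(2), r)        # [1.0, 3.0]

# Modes spread equally over both points sit at the average distance.
c = 1.0 / np.sqrt(2.0)
d_mixed = mode_mean_distance(np.array([[c, -c], [c, c]]), r)   # [2.0, 2.0]
```

With such a quantity in hand, the eigenvalue of each mode can be plotted against its mean distance to inspect the correlation described above.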

The cosmological implications of the present findings are that the error model and/or
the theoretical model need to be modified. The theoretical model assumed in the analysis of
large scale radial velocity surveys is that the velocities are drawn from a Gaussian random
field defined by a given power spectrum. The present study might indicate an inconsistency
of the power spectrum with the data. A less likely possibility is that it indicates a departure
from Gaussian statistics. Alternatively, the present work might indicate a systematic
error that has not been accounted for and that causes this trend. Still another possibility is
that the trend is an indication of a velocity bias.

The conclusions reached here should not be taken as a contradiction of the results
of Zaroubi et al. (1997) and Freudling et al. (1999), but rather as extending and
complementing them. The Bayesian MaxLike analysis can be performed only within the
assumed parameter/model space. The differential $\chi^2$ allows one to go beyond this and
analyze the nature of the agreement, or the lack of it, between a given model and the data
on a mode-by-mode basis.

The PCA transforms the data to a statistically independent representation and enables
the study of the compatibility of the data with the model on a mode-by-mode basis. This
differential analysis is in contrast to the more traditional approach where a data set is
analyzed as a whole. The differential $\chi^2$ analysis should be performed together with the
Bayesian MaxLike analysis and complement it. This should be useful in fields where the
MaxLike is the basic tool of analysis, such as the mapping of the CMB angular fluctuations
and the study of redshift surveys as well as all radial velocity surveys. The present analysis
can prove to be very useful and powerful in those fields where systematic errors play a
crucial role, such as redshift and radial velocity surveys.

We have benefited from many interesting discussions with Avishai Dekel, Tsafrir Kolatt,
Ofer Lahav, Lior Silverman, Simon White and Idit Zehavi. The hospitality of the Racah
Inst. of Physics and the Max Planck Institut für Astrophysik is gratefully acknowledged. This
research has been partially supported by a Binational Science Foundation grant 94-00185
and an Israel Science Foundation grant 103/98.
Fig. 2. Same analysis as in ?? applied to the Mark III and SFI surveys. The model is a
$\Lambda$-CDM ($\Omega_0 = 0.4$, $h = 0.6$ and $n = 1$).